library(tidyverse)
── Attaching packages ──────────────────────────────────────────────────────────────────────────── tidyverse 1.3.1 ──
✓ ggplot2 3.3.5 ✓ purrr 0.3.4
✓ tibble 3.1.6 ✓ dplyr 1.0.7
✓ tidyr 1.1.4 ✓ stringr 1.4.0
✓ readr 2.1.0 ✓ forcats 0.5.1
── Conflicts ─────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
Do cars with big engines use more fuel than cars with small engines?
mpg data frame
mpg
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy))
1. Run ggplot(data = mpg). What do you see?
ggplot(data = mpg)
2. How many rows are in mpg? How many columns?
# rows
nrow(mpg)
[1] 234
# columns
ncol(mpg)
[1] 11
3. What does the drv variable describe? Read the help for ?mpg to find out.
glimpse(mpg)
Rows: 234
Columns: 11
$ manufacturer <chr> "audi", "audi", "audi", "audi", "audi", "audi", "audi", "audi", "audi", "audi", "audi", "audi"…
$ model <chr> "a4", "a4", "a4", "a4", "a4", "a4", "a4", "a4 quattro", "a4 quattro", "a4 quattro", "a4 quattr…
$ displ <dbl> 1.8, 1.8, 2.0, 2.0, 2.8, 2.8, 3.1, 1.8, 1.8, 2.0, 2.0, 2.8, 2.8, 3.1, 3.1, 2.8, 3.1, 4.2, 5.3,…
$ year <int> 1999, 1999, 2008, 2008, 1999, 1999, 2008, 1999, 1999, 2008, 2008, 1999, 1999, 2008, 2008, 1999…
$ cyl <int> 4, 4, 4, 4, 6, 6, 6, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8…
$ trans <chr> "auto(l5)", "manual(m5)", "manual(m6)", "auto(av)", "auto(l5)", "manual(m5)", "auto(av)", "man…
$ drv <chr> "f", "f", "f", "f", "f", "f", "f", "4", "4", "4", "4", "4", "4", "4", "4", "4", "4", "4", "r",…
$ cty <int> 18, 21, 20, 21, 16, 18, 18, 18, 16, 20, 19, 15, 17, 17, 15, 15, 17, 16, 14, 11, 14, 13, 12, 16…
$ hwy <int> 29, 29, 31, 30, 26, 26, 27, 26, 25, 28, 27, 25, 25, 25, 25, 24, 25, 23, 20, 15, 20, 17, 17, 26…
$ fl <chr> "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "r",…
$ class <chr> "compact", "compact", "compact", "compact", "compact", "compact", "compact", "compact", "compa…
4. Make a scatterplot of hwy vs cyl.
ggplot(data = mpg, aes(x = cyl, y = hwy)) +
geom_point()
5. What happens if you make a scatterplot of class vs drv? Why is the plot not useful?
ggplot(data = mpg, aes(x = class, y = drv)) +
geom_point()
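One likely reason this plot is not useful is that both class and drv are categorical, so many cars land on exactly the same point and the plot cannot show how many observations each combination holds. A minimal sketch, assuming geom_count() (which sizes points by the number of overlapping observations) is an acceptable alternative here:

```r
library(ggplot2)

# Both variables are categorical, so the rows of mpg collapse onto
# a small number of distinct (class, drv) positions.
combos <- unique(mpg[, c("class", "drv")])
nrow(combos)  # distinct positions actually drawn
nrow(mpg)     # observations hidden behind them

# geom_count() sizes each point by how many observations overlap there.
p_count <- ggplot(data = mpg, aes(x = class, y = drv)) +
  geom_count()
p_count
```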
ggplot(data = mpg, aes(x = displ, y = hwy, color = class)) +
geom_point()
ggplot(data = mpg, aes(x = displ, y = hwy, size = class)) +
geom_point()
Warning: Using size for a discrete variable is not advised.
# Left
ggplot(data = mpg, aes(x = displ, y = hwy, alpha = class)) +
geom_point()
Warning: Using alpha for a discrete variable is not advised.
# Right
ggplot(data = mpg, aes(x = displ, y = hwy, shape = class)) +
geom_point()
Warning: The shape palette can deal with a maximum of 6 discrete values because more than 6 becomes difficult to
discriminate; you have 7. Consider specifying shapes manually if you must have them.
Warning: Removed 62 rows containing missing values (geom_point).
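The two shape warnings above appear to be linked: the default shape palette covers six levels, class has seven, and the level left without a shape is dropped. The sketch below checks that the 62 removed rows match the suv class, and shows one possible workaround with scale_shape_manual(); the shape codes 0 to 6 are an arbitrary choice for illustration.

```r
library(ggplot2)

# class has 7 levels; the last one in alphabetical order gets no shape.
class_levels <- sort(unique(mpg$class))
class_levels
n_suv <- sum(mpg$class == "suv")
n_suv

# Hypothetical fix: supply one shape code per class by hand.
p_shapes <- ggplot(data = mpg, aes(x = displ, y = hwy, shape = class)) +
  geom_point() +
  scale_shape_manual(values = 0:6)
p_shapes
```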
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point(color = "blue")
1. What’s wrong with this code? Why are the points not blue?
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))
Inside aes(), "blue" is treated as a data value mapped to the color scale, not as a literal color. To set the color manually, put the color argument outside aes().
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
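To see the difference between mapping and setting, the computed aesthetics of each layer can be inspected with layer_data(). This is only a sketch for checking the behavior; the object names p_mapped and p_set are made up for the example.

```r
library(ggplot2)

# Inside aes(): "blue" is a one-level variable, so the color scale
# chooses the color and a legend is drawn.
p_mapped <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

# Outside aes(): "blue" is used directly as the point color.
p_set <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

# layer_data() returns the data a layer is drawn with, including colour.
unique(layer_data(p_mapped)$colour)
unique(layer_data(p_set)$colour)
```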
2. Which variables in mpg are categorical? Which variables are continuous? How can you see this information when you run mpg?
glimpse(mpg)
Rows: 234
Columns: 11
$ manufacturer <chr> "audi", "audi", "audi", "audi", "audi", "audi", "audi", "audi", "audi", "audi", "audi", "audi"…
$ model <chr> "a4", "a4", "a4", "a4", "a4", "a4", "a4", "a4 quattro", "a4 quattro", "a4 quattro", "a4 quattr…
$ displ <dbl> 1.8, 1.8, 2.0, 2.0, 2.8, 2.8, 3.1, 1.8, 1.8, 2.0, 2.0, 2.8, 2.8, 3.1, 3.1, 2.8, 3.1, 4.2, 5.3,…
$ year <int> 1999, 1999, 2008, 2008, 1999, 1999, 2008, 1999, 1999, 2008, 2008, 1999, 1999, 2008, 2008, 1999…
$ cyl <int> 4, 4, 4, 4, 6, 6, 6, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8…
$ trans <chr> "auto(l5)", "manual(m5)", "manual(m6)", "auto(av)", "auto(l5)", "manual(m5)", "auto(av)", "man…
$ drv <chr> "f", "f", "f", "f", "f", "f", "f", "4", "4", "4", "4", "4", "4", "4", "4", "4", "4", "4", "r",…
$ cty <int> 18, 21, 20, 21, 16, 18, 18, 18, 16, 20, 19, 15, 17, 17, 15, 15, 17, 16, 14, 11, 14, 13, 12, 16…
$ hwy <int> 29, 29, 31, 30, 26, 26, 27, 26, 25, 28, 27, 25, 25, 25, 25, 24, 25, 23, 20, 15, 20, 17, 17, 26…
$ fl <chr> "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "r",…
$ class <chr> "compact", "compact", "compact", "compact", "compact", "compact", "compact", "compact", "compa…
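The type tags in the glimpse() output (<chr>, <int>, <dbl>) are what separate the two groups. As a rough sketch, the same split can be computed from the column types; note the assumption that every numeric column counts as continuous, although year and cyl take only a few distinct values and could reasonably be treated as categorical.

```r
library(ggplot2)

# Character columns are categorical.
categorical <- names(mpg)[sapply(mpg, is.character)]
# Integer and double columns are treated as continuous here.
continuous <- names(mpg)[sapply(mpg, is.numeric)]

categorical
continuous
```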
3. Map a continuous variable to color, size, and shape. How do these aesthetics behave differently for categorical vs. continuous variables?
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = year))
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, size = year))
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, shape = year))
Error: A continuous variable can not be mapped to shape
Run `rlang::last_error()` to see where the error occurred.
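The error arises because shapes have no natural ordering, so shape accepts only discrete variables. One possible workaround, sketched below, is to convert the variable with factor() before mapping it; this assumes treating year as a category is acceptable for the plot.

```r
library(ggplot2)

# factor(year) turns the two model years into discrete levels,
# which the shape scale can handle.
p_shape <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = factor(year)))
p_shape

# One shape code per model year.
unique(layer_data(p_shape)$shape)
```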
4. What happens if you map the same variable to multiple aesthetics?
5. What does the stroke aesthetic do? What shapes does it work with?
6. What happens if you map an aesthetic to something other than a variable name, like aes(color = displ < 5)? Note, you’ll also need to specify x and y.
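For question 6, a quick sketch suggests that ggplot2 evaluates the expression for every row and maps the resulting TRUE/FALSE values like any other discrete variable. The object name p_lgl is made up for the example.

```r
library(ggplot2)

# displ < 5 is evaluated per row, giving a logical vector that is
# mapped to color as a two-level discrete variable.
p_lgl <- ggplot(data = mpg, aes(x = displ, y = hwy, color = displ < 5)) +
  geom_point()
p_lgl

# Two colors: one for TRUE, one for FALSE.
unique(layer_data(p_lgl)$colour)
```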
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point() +
facet_grid(drv ~ cyl)
1. What happens if you facet on a continuous variable?
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point() +
facet_wrap(~ cty)
The continuous variable is treated as categorical: the plot gets one panel for each distinct value of the variable.
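The panel count can be checked against the number of distinct cty values. The sketch below also shows one possible way to get fewer panels by binning first with cut_width(); the bin width of 5 and the column name cty_bin are arbitrary choices for illustration.

```r
library(ggplot2)

p_cty <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ cty)

# One panel per distinct value of cty.
n_panels <- length(unique(layer_data(p_cty)$PANEL))
n_values <- length(unique(mpg$cty))

# Hypothetical alternative: bin cty into intervals of width 5 first.
mpg_binned <- transform(mpg, cty_bin = cut_width(cty, width = 5))
p_binned <- ggplot(data = mpg_binned, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ cty_bin)
n_binned <- length(unique(layer_data(p_binned)$PANEL))
```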
2. What do the empty cells in plot with facet_grid(drv ~ cyl) mean? How do they relate to this plot?
An empty cell means that no cars in the data have that combination of drv and cyl.
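A cross-tabulation makes the empty combinations explicit. This is a small sketch using base R's table(); zero counts correspond to the empty facet cells.

```r
library(ggplot2)

# Count cars for every drv x cyl combination.
combo_counts <- table(drv = mpg$drv, cyl = mpg$cyl)
combo_counts

# Combinations with a zero count have no points to draw.
empty <- which(combo_counts == 0, arr.ind = TRUE)
empty
```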
3. What plots does the following code make? What does the . do?
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(drv ~ .)
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(. ~ cyl)
The . is a placeholder that means "do not facet on this dimension": drv ~ . facets by rows only, and . ~ cyl facets by columns only.
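As a rough check of what the formula means, facet_grid() also accepts rows and cols arguments built with vars(). The sketch below assumes that facet_grid(drv ~ .) and facet_grid(rows = vars(drv)) are two spellings of the same layout and compares the layer data they produce.

```r
library(ggplot2)

base_plot <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# Formula form: drv on the rows, nothing on the columns.
p_formula <- base_plot + facet_grid(drv ~ .)
# Explicit form: the same layout without the . placeholder.
p_rows <- base_plot + facet_grid(rows = vars(drv))

# Both forms should assign every point to the same panel.
same_panels <- identical(
  as.character(layer_data(p_formula)$PANEL),
  as.character(layer_data(p_rows)$PANEL)
)
same_panels
```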
4. Take the first faceted plot in this section:
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_wrap(~ class, nrow = 2)
What are the advantages to using faceting instead of the color aesthetic? What are the disadvantages? How might the balance change if you had a larger dataset?
5. Read ?facet_wrap. What does nrow do? What does ncol do? What other options control the layout of the individual panels? Why doesn’t facet_grid() have nrow and ncol arguments?
6. When using facet_grid() you should usually put the variable with more unique levels in the columns. Why?
1. What geom would you use to draw a line chart?
geom_line()
A boxplot?
geom_boxplot()
A histogram?
geom_histogram()
An area chart?
geom_area()
2. Run this code in your head and predict what the output will look like. Then, run the code in R and check your predictions.
This code will produce a scatterplot of hwy against displ with points colored by drv, plus one smoothed line per drv value and no confidence bands.
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
geom_point() +
geom_smooth(se = FALSE)
`geom_smooth()` using method = 'loess' and formula 'y ~ x'
3. What does show.legend = FALSE do? What happens if you remove it? Why do you think I used it earlier in the chapter?
show.legend = FALSE suppresses the legend for that layer. If you remove it, the layer's aesthetic mappings are shown in a legend again.
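A small sketch of the effect: the two plots below differ only in show.legend, and comparing their layer data suggests that the argument changes the legend but not what is drawn. The method and formula are written out explicitly here only to silence the geom_smooth() message.

```r
library(ggplot2)

# Default: the color mapping produces a legend for drv.
p_legend <- ggplot(data = mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_smooth(se = FALSE, method = "loess", formula = y ~ x)

# Same layer with the legend suppressed.
p_no_legend <- ggplot(data = mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_smooth(se = FALSE, method = "loess", formula = y ~ x,
              show.legend = FALSE)

# The drawn data is the same; only the legend differs.
same_data <- isTRUE(all.equal(layer_data(p_legend), layer_data(p_no_legend)))
same_data
```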
4. What does the se argument to geom_smooth() do?
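It controls whether the confidence band around the smoothed line is drawn. The band is based on the standard error of the fit and is shown by default (se = TRUE).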
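One way to see what se changes is to look at the data computed for the smooth layer. In this sketch the band limits appear as ymin and ymax columns when se = TRUE; the check that they are absent when se = FALSE reflects how current ggplot2 versions behave and is an assumption rather than a documented guarantee.

```r
library(ggplot2)

# With se = TRUE (the default) the layer carries the band limits.
p_se <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_smooth(se = TRUE, method = "loess", formula = y ~ x)

# With se = FALSE only the fitted line is computed.
p_no_se <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_smooth(se = FALSE, method = "loess", formula = y ~ x)

has_band    <- all(c("ymin", "ymax") %in% names(layer_data(p_se)))
has_no_band <- !any(c("ymin", "ymax") %in% names(layer_data(p_no_se)))
```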
5. Will these two graphs look different? Why/why not?
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth()
`geom_smooth()` using method = 'loess' and formula 'y ~ x'
ggplot() +
geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy))
`geom_smooth()` using method = 'loess' and formula 'y ~ x'
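No, they will produce the same plot. The first sets the data and mappings once in ggplot(), and both layers inherit them; the second repeats the same data and mappings in each layer.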
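The equivalence can be checked by comparing the computed data of each layer in both plots. A minimal sketch, with method and formula spelled out only to silence the geom_smooth() message; the names p_global and p_local are made up for the example.

```r
library(ggplot2)

# Data and mappings set once, inherited by both layers.
p_global <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

# The same data and mappings repeated in each layer.
p_local <- ggplot() +
  geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy),
              method = "loess", formula = y ~ x)

# Compare layer 1 (points) and layer 2 (smooth) across the two plots.
same_points <- isTRUE(all.equal(layer_data(p_global, 1), layer_data(p_local, 1)))
same_smooth <- isTRUE(all.equal(layer_data(p_global, 2), layer_data(p_local, 2)))
```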
6. Recreate the R code necessary to generate the following graphs.
(plot1 + plot2) / (plot3 + plot4) / (plot5 + plot6)
`geom_smooth()` using method = 'loess' and formula 'y ~ x'
`geom_smooth()` using method = 'loess' and formula 'y ~ x'
`geom_smooth()` using method = 'loess' and formula 'y ~ x'
`geom_smooth()` using method = 'loess' and formula 'y ~ x'
`geom_smooth()` using method = 'loess' and formula 'y ~ x'